Remember, you only have about 1 second to think before outputting each token.
I don’t think this is true. Humans can decide to think longer on harder problems in a way GPT-3 can’t. Our “architecture” is fundamentally different from GPT-3’s in that regard.
Also, our ability to think for longer fundamentally changes how we do concept extrapolation. Given a tricky extrapolation problem, you wouldn’t just spit out the first thing to enter your mind. You’d think about it.
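To make the architectural point concrete, here is a minimal sketch (my own toy illustration, assuming nothing about GPT-3’s internals beyond standard greedy autoregressive decoding): each new token costs exactly one fixed forward pass, and nothing in the loop lets the model spend extra compute on a harder token.

```python
# Minimal sketch of greedy autoregressive decoding (toy illustration,
# not GPT-3's actual implementation). The point: per-token compute is
# constant; there is no branch that "thinks longer" on hard tokens.
from typing import Callable, List

def greedy_decode(model: Callable[[List[int]], List[float]],
                  prompt: List[int], n_new: int) -> List[int]:
    tokens = list(prompt)
    for _ in range(n_new):
        logits = model(tokens)  # one fixed-cost forward pass per token
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens

# Toy stand-in "model": always prefers the token after the last one, mod 10.
toy = lambda toks: [1.0 if i == (toks[-1] + 1) % 10 else 0.0 for i in range(10)]
print(greedy_decode(toy, [3], 5))  # [3, 4, 5, 6, 7, 8]
```

A human, by contrast, can keep looping on the same problem indefinitely before committing to an answer; the decoding loop above has no analogous knob.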
If GPT-3 has an architectural limitation that prevents it from doing concept extrapolation in a human-like manner, we shouldn’t change our evaluation benchmarks to avoid “unfairly” penalizing GPT-3. We should acknowledge that limitation and ask how it impacts alignment prospects.
It sounds like we are on the same page. GPT-3 has an architectural limitation such that (a) it would be very surprising and impressive if it could make a coherent sentence out of reversed words, and (b) if it did succeed, it must be doing something substantially different from how a human would do it. That was my original point. Maybe I’m just not understanding the point Stuart is making; that’s probably the case.
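For concreteness, here is one plausible reading of the reversed-words task (an assumption on my part; the exact probe may have differed): reverse the letters of each word in a sentence and ask the model to recover or continue it coherently.

```python
# Hypothetical construction of the reversed-words probe (my assumption
# about the task format, not a quote of the actual setup).
def reverse_each_word(sentence: str) -> str:
    return " ".join(word[::-1] for word in sentence.split())

print(reverse_each_word("the cat sat on the mat"))  # eht tac tas no eht tam
```

Succeeding at this without the deliberate, multi-step unscrambling a human would do is exactly the kind of thing a fixed per-token compute budget makes surprising.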