Note that I was able to reproduce this result with ChatGPT (not Plus, to be clear) without too much trouble. So at least in this case, I don’t think this is an example of something beyond GPT-3.5—which is good, because writing slightly modified quines like this isn’t something I would have expected GPT-3.5 to have trouble with!
(When I say “without too much trouble”, I specifically mean that ChatGPT’s initial response used the open(sys.argv[0]) method to access the file’s source code, despite my initial request to avoid this kind of approach. But when I pointed out that this approach violated one of the constraints, and prodded it to try again, it did in fact successfully produce a version of the script without this issue.)
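(For concreteness, here is a minimal sketch of the kind of file-reading approach I mean. This is not ChatGPT's verbatim output, and the uppercasing at the end is just a placeholder I've chosen to stand in for whatever "slight modification" the original prompt asked for:)

```python
import sys

# The kind of approach the prompt asked to avoid: open the running script by
# its own path and read the file back in.
with open(sys.argv[0]) as f:
    source = f.read()

# Illustrative placeholder for the "slight modification"; the actual task's
# modification may well have been something else.
print(source.upper())
```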
(In fact, because I regenerated its response several times out of curiosity, it produced multiple such scripts, some with the same approach used by GPT-4 above, and other times with a more sophisticated approach using inspect.getsource(sys.modules[__name__]) instead. So I really think GPT-3.5 isn’t being given enough credit here!)
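(And, again as a rough sketch rather than verbatim output, the inspect-based version it produced was shaped roughly like this, with the same placeholder "modification" as above:)

```python
import inspect
import sys

# The revised approach: ask the inspect module for the source of the module
# that is currently running, rather than opening the script file by path.
source = inspect.getsource(sys.modules[__name__])

# Same illustrative placeholder modification as above.
print(source.upper())
```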
On the other hand, of course, that’s not to say that GPT-4 doesn’t represent a substantial capability improvement over GPT-3.5; that much should be obvious from its performance charts. Specifically, I think GPT-4 has in fact managed to acquire much better “board vision” than GPT-3.5 in various domains, of the sort I claimed in this comment and this one (see also the top-level post itself, whose overall thrust I continue to broadly agree with, even as I note places where the state of the art is being pushed forward).
(Full disclosure: I do think that GPT-4’s performance in chess, specifically, is a place where my models concede Bayes points. Even though I didn’t explicitly predict that GPT-4 wouldn’t improve at chess in either of the two linked comments (in fact, I specifically went out of my way to note that I was uncertain), it remains true that my model permitted both worlds, the one in which GPT-4 became much better at chess than GPT-3.5 and the one in which it didn’t, and so it was surprised when those worlds narrowed to one and not the other. I say this even as I go on to say (perhaps frustratingly!) that I don’t think my high-level picture has shifted much as a result of these observations; I did, in fact, expect to receive evidence on this front, just not in this specific form.)