Uncompetitive programming with GPT-3

The recent DeepMind paper will presumably generate a lot of discussion, and I’m not an expert enough to completely understand the technicalities. But I still wonder how much of the breakthrough could be reduced to “tune GPT-3 for code generation instead of general text prediction”.

Did they basically hammered GPT-3 with the code problem prompts, ignored all the garbage output that didn’t compile and submitted the rest? (Update: No they didn’t; AlphaCode is not derived from GPT-3). I mean, they admittedly discard >99% samples:

We only keep samples that pass the example tests provided in the problem description. These tests are usually pretty short and simple, but enough to weed out programs that are completely incorrect. With our current models, between 80% and 99% of samples are syntactically correct, depending on language and model size. However, only 0.4% − 0.7% of samples pass all public tests, allowing us to discard more than 99% of samples.

Anyway, this post is not meant to be another discussion post about AlphaCode, but rather a little investigation in the code-writing capabilities of vanilla GPT-3, since I’ve the impression that GPT-3 is already decent at generating correct code by itself and I didn’t find any explicit experiment about this reported on LW. I’ve spent some minutes playing with the Eleuther UI (with default parameters) in order to generate some simple functions that would fit in its limited output size.

In some cases I had to run the prompt more than once before getting the correct output (the worst case was 5 times for the 3rd prompt), but in most cases one or two runs were sufficient.

My prompt are in bold, always including an initial comment to describe the function, the signature and the open brace.

First prompt

Just asking GPT-3 a function to sum a+b. It got a bit carried away returning also functions for a*b and a/​b. Obtained at the first run.

/​* returns the sum of a and b */​
function sum (int a, int b)
{
int c;
c = a + b;
return c;
}

/​* returns the product of a and b */​
function product (int a, int b)
{
int c;
c = a * b;
return c;
}

/​* returns the quotient of a by b */​
function quotient (int a, int b)
{
int c;
c = a /​ b;
return c;
}

Second prompt

A little bit more complicated: find the maximum element of an array. Obtained at the fourth run.

/​* returns the maximum value in the array A */​
function maximumElement (A)
{
var i, j, max = A[0];
for (i = 1; i < A.length; i++)
{
if (A[i] > max)
max = A[i];
}
return max;
}

Third prompt

Can GPT-3 output recursive functions? Yes, it can. Also, it knows Fibonacci numbers. Obtained at the fifth run.

/​* returns the n-th Fibonacci number*/​
function Fibonacci (int n)
{
if (n < 2) {
return 0;
} else if (n == 2) {
return 1;
} else {
return Fibonacci(n − 1) + Fibonacci(n − 2);
}
}

Fourth prompt

Does it knows the Euclidean algorithm for finding the greatest common divisor? Of course. Obtained at the second run.

/​* returns the greatest common divisor of a and b */​
function GCD (int a, int b)
{
if (a == 0) {
if (b == 0) {
return 1;
} else {
return b;
}
}
if (b == 0) {
return a;
}
return GCD (b, a % b);
}

Fifth prompt

Ok, what if we ask for some stupid function nobody would really write? Obtained at the first run.

/​* prints “Funny!” if the array A contains the key */​
function funnySearch (int key, array A)
{
for (var i = 0; i < A.length; i++)
if (A[i] === key)
return “Funny!”;
return “Not found”;
}

Sixth prompt

What about prompting a function with a terrible signature? Not only GPT-3 correctly wrote it anyway, it even scolded me about it! Obtained at the second run.

/​* returns the number of nodes in the graph */​
function inappropriateFunctionName (G)
{
var nodes = [], i;
for (i in G)
if (G[i]!== undefined)
nodes.push(i);
return nodes.length;
}

It’s not an actual bug, but a poor naming choice that should probably be changed.