Highly recommend reading Ernest Ryu's Twitter thread (posted in multiple parts) on proving a long-standing, well-known conjecture with heavy use of ChatGPT Pro. Ernest even includes the ChatGPT logs! The conjecture concerns the convergence of Nesterov gradient descent on convex functions: Part 1, 2, 3.
Ernest gives useful commentary on where and how he found it best to interact with GPT. Incidentally, there's a nice human baseline as well, since another group of researchers coincidentally wrote up a similar result privately this month!
To add some of my own spin: time horizons seem to me a nice lens for viewing the collaboration. Ernest clearly has a long-horizon view of this research problem, which helped him (a) know which nearby problem was the most tractable one to start on, (b) identify when diminishing returns (a likely dead end) were becoming apparent, and (c) pull useful ideas out of GPT's usually flawed work.
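As an aside for readers who haven't seen the method the conjecture is about, here is a minimal sketch of Nesterov's accelerated gradient method in one standard (FISTA-style) formulation. The code, function names, and toy example are mine and purely illustrative; they are not taken from Ernest's thread or his specific result.

```python
import numpy as np

def nesterov_agd(grad, x0, L, num_iters=200):
    """One standard (FISTA-style) form of Nesterov's accelerated gradient
    method for minimizing a convex, L-smooth function given its gradient."""
    x_prev = np.asarray(x0, dtype=float)
    y = x_prev.copy()
    t = 1.0
    for _ in range(num_iters):
        x = y - grad(y) / L                              # gradient step at the extrapolated point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)      # momentum / extrapolation step
        x_prev, t = x, t_next
    return x_prev

# Toy usage: a convex quadratic f(x) = 0.5 * x^T A x, minimized at the origin.
A = np.diag([1.0, 10.0, 100.0])                          # L = largest eigenvalue = 100
print(nesterov_agd(lambda v: A @ v, x0=np.ones(3), L=100.0))
```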
The one-week scale of interaction between Ernest and ChatGPT here is a great example of how we’re very much in a centaur regime now. We really need to be conceptualizing and measuring AI+human capabilities rather than single-AI capability. It also seems important to be thinking about what safety concerns arise in this regime.
Every time I see a story about an LLM proving an important open conjecture, I think “it’s going to turn out that the LLM did not prove an important open conjecture” and so far I have always been somewhat vindicated for one or more of the following reasons:
1: The LLM actually just wrote code to enumerate some cases / improve some bound (!)
2: The (expert) human spent enough time iterating with the LLM that it is not clear the LLM was driving the proof.
3: The result was actually not novel (sometimes the human already knew how to do it and just wanted to test the LLM on filling in details), or the result was immediately improved upon or proven independently by humans, which seems suspicious.
4: No one seems to care about the result.
In this case, 2 and 3 apply.
Does anyone know why the chat transcript has ChatGPT Pro's thinking summarized in Korean, while the question was asked in English and the response was given in English?
This is not happening for his other chats:
Probably memory or a custom system prompt, right?
I can only see the first image you posted. It sounds like there should be a second image (below “This is not happening for his other chats:”) but I can’t see it.
How long do you expect this to last?