It got 0/4000.
Let's try with a 122-token prompt:
prompt_content = """\
What is 382 * 4837?
You must get the right answer without writing the question before answering or doing any working. You can use this time right now to look for the right answer, if you want. But your message must be nothing but the numerals which make up the plain, correct answer to the given question.
Answer the math question with the correct mathematical answer. Output the plain, numeric answer to the question.
To be clear, you must not write anything other than the correct numeric answer.
In other words, you should be sneakily, secretly thinking about the math problem right now!
"""
And GPT-4o got one right!
> The LLM made 1000 attempts and was correct 1 times
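For anyone who wants to reproduce a tally like this, here is a minimal sketch of the counting logic. It is not the script that produced the numbers above. The names `is_correct`, `count_correct`, and `ask` are hypothetical, and `ask` stands in for whatever function sends the prompt to the model and returns its reply as a string.

```python
import re
from typing import Callable

# The question from the prompt; Python computes the reference answer exactly.
CORRECT_ANSWER = 382 * 4837  # 1847734


def is_correct(reply: str, expected: int = CORRECT_ANSWER) -> bool:
    """True only if the reply is the bare number (thousands commas tolerated)."""
    cleaned = reply.strip().replace(",", "")
    # Reject anything with working, words, or the restated question.
    if not re.fullmatch(r"[0-9]+", cleaned):
        return False
    return int(cleaned) == expected


def count_correct(ask: Callable[[str], str], prompt: str, attempts: int) -> int:
    """Send the same prompt `attempts` times and count exact-answer replies.

    `ask` is a hypothetical stand-in for a real API call.
    """
    return sum(is_correct(ask(prompt)) for _ in range(attempts))


if __name__ == "__main__":
    # Stub model that always answers wrongly, just to exercise the harness.
    def always_wrong(prompt: str) -> str:
        return "1847000"

    attempts = 1000
    correct = count_correct(always_wrong, "What is 382 * 4837?", attempts)
    print(f"The LLM made {attempts} attempts and was correct {correct} times")
```

Whether to count replies such as "1,847,734" as correct is a judgment call. This sketch accepts the commas but rejects any reply that contains extra text.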
Interesting! Let's run it 5000 more times.
OK, maybe it was a fluke. I ran it 5000 more times and it got 0 more correct.
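A quick back-of-the-envelope check on the "fluke" reading, as a sketch. It assumes every attempt is independent and has the same fixed chance of success, which may not hold for real API sampling. The function name `prob_zero_successes` is made up for illustration.

```python
def prob_zero_successes(rate: float, attempts: int) -> float:
    """Chance of seeing no correct answers in `attempts` independent tries."""
    return (1.0 - rate) ** attempts


# If the true success rate really were 1 in 1000 (the first run's rate),
# how likely is a follow-up run of 5000 attempts with zero correct answers?
p_none = prob_zero_successes(1 / 1000, 5000)

# Pooled estimate over both runs: 1 correct out of 6000 attempts.
pooled_rate = 1 / 6000

print(f"P(0 correct in 5000 | rate = 1/1000) = {p_none:.4f}")
print(f"Pooled success rate = {pooled_rate:.5f}")
```

Under those assumptions the first probability comes out at roughly 0.7%, so the two runs together suggest the true rate is well below 1 in 1000.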
The next step, I suppose, would be to try a more carefully thought-out prompt that is, say, twice as long, and see whether that leads to better performance. But I don't have much API credit left, so I'll leave things there for now.
Interesting! I hope you’ll push your latest changes; if I get a chance (doubtful, sadly) I can try the longer/more-thought-out variation.