Very cool! I decided to try the same with Mandelbrot. For reference, this is what it should roughly look like:
And below is what it actually looked like when querying GPT-4o and using the logprobs of the 0 and 1 tokens. I used the prompt[1] "Is c = ${re} + ${im}i in the Mandelbrot set? Reply only 1 if yes, 0 if no. No text, just number." (The result is in a collapsible section so you can first predict what level of quality you’d expect.)
GPT-4o:
A bit underwhelming; I would have thought it would be better at getting the very basic structure right. At least it does seem to know where the “centers” are: the pronounced vertical bars align very well with the bigger areas of the original.
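For anyone wanting to reproduce this, here is a minimal sketch of how the two logprobs could be turned into a pixel value. The prompt text is the one quoted above; everything else is an assumption on my part: the `TokenLogprob` shape is a hypothetical stand-in for one entry of a top-logprobs list, and normalizing P("1") against P("0") + P("1") is just one plausible way to get a brightness, not necessarily what was used for the pictures here.

```typescript
// Hypothetical shape, modeled on one entry of a top-logprobs list.
interface TokenLogprob {
  token: string;
  logprob: number;
}

// Build the prompt for one point c = re + im*i.
function buildPrompt(re: number, im: number): string {
  return (
    `Is c = ${re} + ${im}i in the Mandelbrot set? ` +
    `Reply only 1 if yes, 0 if no. No text, just number.`
  );
}

// Assumption: pixel brightness = P("1") / (P("0") + P("1")).
// A token missing from the list contributes probability 0.
function probInSet(candidates: TokenLogprob[]): number {
  let p0 = 0;
  let p1 = 0;
  for (const { token, logprob } of candidates) {
    const t = token.trim();
    if (t === "0") p0 += Math.exp(logprob);
    if (t === "1") p1 += Math.exp(logprob);
  }
  const total = p0 + p1;
  // If neither token shows up there is no signal; fall back to mid-gray.
  return total > 0 ? p1 / total : 0.5;
}
```

Normalizing over just the two tokens means any probability mass the model puts on other tokens is ignored rather than darkening the pixel.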
To be fair, in an earlier test, I had a longer and slightly different prompt (that should have yielded about the same results, or so I thought), and GPT-4o gave me this, which looks a bit better:
Sadly, I don’t remember what the exact prompt was, and I wasn’t using version control at that stage. Whoops.
I wanted to try GPT-5 or GPT-5-mini as well, but it turns out there is no way to disable reasoning for them in the API. This a) makes the whole exercise much more expensive (even though GPT-5 is cheaper than 4o per token) and b) defeats the purpose a bit, as reasoning might let the model actually run the numbers to some degree. These models of course know the formula, and can probably multiply complex numbers at not-terrible accuracy (maybe? Actually, I'm not so sure; I will test this).
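For reference, "the formula" here is the usual escape-time test: start at z = 0, repeatedly apply z → z² + c, and call c a member if |z| never exceeds 2. Below is a minimal sketch of it. The function name and the `maxIter` cutoff of 200 are my own choices for illustration, and a finite cutoff means points very close to the boundary can be misclassified as inside.

```typescript
// Standard escape-time test: iterate z -> z^2 + c from z = 0 and
// report whether |z| stays <= 2 for maxIter steps.
function inMandelbrot(re: number, im: number, maxIter = 200): boolean {
  let zr = 0;
  let zi = 0;
  for (let i = 0; i < maxIter; i++) {
    // (zr + zi*i)^2 = (zr^2 - zi^2) + (2*zr*zi)i
    const nextRe = zr * zr - zi * zi + re;
    const nextIm = 2 * zr * zi + im;
    zr = nextRe;
    zi = nextIm;
    if (zr * zr + zi * zi > 4) return false; // |z| > 2: escaped
  }
  return true; // did not escape within maxIter; treated as inside
}
```

Each iteration is one complex multiplication plus an addition, which is exactly the kind of arithmetic a reasoning model could attempt step by step.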
For the record, the larger GPT-4o picture cost about $3 in credits.
[1] I only now realize that this might yield slightly worse results for negative imaginary parts, as c = 1.5 + -1i looks odd and may throw the model off a bit. Oh well.
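One way to avoid the odd-looking "+ -1i" would be to pick the sign separately and print the absolute value of the imaginary part. This is only a sketch: `formatComplex` is a hypothetical helper, and whether the cleaner formatting actually changes the model's answers is untested.

```typescript
// Hypothetical helper: format re + im*i so that a negative imaginary
// part reads "1.5 - 1i" rather than "1.5 + -1i".
function formatComplex(re: number, im: number): string {
  const sign = im < 0 ? "-" : "+";
  return `${re} ${sign} ${Math.abs(im)}i`;
}
```

The result could then be substituted into the prompt in place of the separate real and imaginary placeholders.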
Would you say that fixed distributions with day-to-day variation are a common phenomenon? Of course, it depends on where we sample from, but intuitively I would guess that “most things” that have variation can also be influenced. Then again, “most things” is not very meaningful without cleaner definitions of all the terms.
Maybe instead of “truly entirely fixed”, I should say something like “truly resistant to targeted intervention”.