It’s a good thought, and I had the same one a while ago, but I think dr_s is right here; FEN isn’t helpful to GPT-3.5 because it hasn’t seen many FENs in its training, and it just tends to bungle it.
Lichess study, ChatGPT conversation link
GPT-3.5 has trouble from the start maintaining a correct FEN, and makes its first illegal move on move 7, and starts making many illegal moves around move 13.
I don’t think it’s a question of the context window—the same thing happens if you just start anew with the original “magic prompt” and the whole current score. And the current score is alone is short, at most ~100 tokens—easily enough to fit in the context window of even a much smaller model.
In my experience, also, FEN doesn’t tend to help—see my other comment.