Yes, I think this could have favored Claude models.
Due to our lack of ML experience, we were unable to write the prompt entirely ourselves, but we would love to work with someone who has an ML background as part of future work involving a human-prompt baseline.
There are several games we want to add in future work, but we currently have only a single consumer GPU, so we can probably run only one game at a time and are mostly limited to small ones. We would like to hear which games people are most interested in. Similarly sized games such as Mancala, as well as larger ones like Othello, are probably both important for future work. We're also thinking about how to anchor results for much larger games that don't have solvers.
but we would love to work with someone with an ML background as part of future work involving a human prompt baseline.
My impression, as someone with such a background, is that prompt engineering isn't as important as it used to be. In the days of GPT-3, we were searching a largely unrefined space of model outputs, and text that correlated with the kinds of writers who produced good work had an outsized impact on output quality. Nowadays, though, a tightly optimized RLHF/RLVR training pipeline doesn't leave nearly as much alpha on the table.
I'll admit that this is primarily intuition and vibes, but if it's sufficiently easy and cheap to do, tossing out a barebones human-written prompt and seeing how a model does could be a fair test. I recall that I didn't need anything particularly detailed when I had Claude implement a simple multithreaded PPO repo a few months back to see whether it could.